Research Statement - Ronen Feldman
Abstract
The information age has made it easy to store large amounts of data. The proliferation of documents available on the Web, on corporate intranets, on news wires, and elsewhere is overwhelming. However, while the amount of data available to us is constantly increasing, our ability to absorb and process this information remains constant. Search engines only exacerbate the problem by making more and more documents available within a few keystrokes. Text Mining is a new and exciting research area that tries to solve the information overload problem by using techniques from data mining, machine learning, natural language processing (NLP), information retrieval (IR), and knowledge management. Text Mining involves the preprocessing of document collections (text categorization, information extraction, term extraction), the storage of the intermediate representations, the techniques used to analyze these intermediate representations (distribution analysis, clustering, trend analysis, association rules, etc.), and the visualization of the results. My research revolves around the various components of text mining. In the following sections I describe the research activities I have carried out in recent years and my plans for future research. My main motto in research is the combination of theory and practice: in each of the following areas we have developed a complete theory and shown that it works in practice by implementing a large-scale system based on that theory.
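The pipeline outlined in the abstract (preprocessing, an intermediate representation, then analysis) can be illustrated with a toy sketch. This is not the author's system: the function names, the stopword list, the thresholds, and the four sample headlines are all hypothetical, and term extraction is reduced to simple whitespace tokenization. Each document becomes a set of terms (the intermediate representation), from which a term distribution and pairwise association rules are computed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list for the toy data below.
STOPWORDS = {"the", "a", "of", "and", "in", "to"}


def extract_terms(text):
    """Preprocessing: lowercase, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def term_distribution(term_sets):
    """Distribution analysis: number of documents containing each term."""
    return Counter(t for terms in term_sets for t in terms)


def association_rules(term_sets, min_support=2, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) over pairs of terms.

    support    = number of documents containing both terms
    confidence = support / number of documents containing lhs
    """
    doc_freq = term_distribution(term_sets)
    pair_freq = Counter()
    for terms in term_sets:
        pair_freq.update(combinations(sorted(terms), 2))

    rules = []
    for (a, b), support in pair_freq.items():
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / doc_freq[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Made-up sample collection, for illustration only.
docs = [
    "Acme acquires Widget Corp",
    "Acme reports earnings",
    "Widget Corp and Acme merger approved",
    "Globex reports earnings",
]

term_sets = [extract_terms(d) for d in docs]  # intermediate representation
rules = association_rules(term_sets)

for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support}, confidence={confidence:.2f})")
```

Representing each document as a set of terms keeps the analysis steps independent of how the terms were obtained, so a real term extractor or information extraction component could replace `extract_terms` without changing the rest of the sketch.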
Similar resources
Computerized retrieval and classification: An application to reasons for late filings with the securities and exchange commission
This study explores a system to retrieve and classify the reasons for late mandatory SEC (Securities and Exchange Commission) filings. From the source documents, the system identifies the reasons for the late filing and classifies them into one or more of seven categories. The system can be used by potential investors who have to track a large number of filings concentrated within a day or two....
The Effectiveness of Feldman Multilevel Integrative Approach Training on Increase Marital Relationship Transparency and Decrease Divorce Tendency in Divorce Seeking Couples
Introduction: Divorce seeking couples have many problems in their marital relationships and one of the most effective methods to improve marital characteristics related life is Feldman multilevel integrative approach training method. Therefore, present research aimed to determine the effectiveness of Feldman multilevel integrative approach training on increase marital relationship transparency ...
Utility of EBL in Recursive Domain Theories
We investigate the utility of explanation-based learning in recursive domain theories and examine the cost of using macro-rules in these theories. The compilation options in a recursive domain theory range from constructing partial unwindings of the recursive rules to converting recursive rules into iterative ones. We compare these options against using appropriately ordered rules in the origin...
A Hybrid Approach to NER by Integrating Manual Rules into MEMM
This paper describes a framework for defining domain specific Feature Functions in a user friendly form to be used in a Maximum Entropy Markov Model (MEMM) for the Named Entity Recognition (NER) task. Our system called MERGE allows defining general Feature Function Templates, as well as Linguistic Rules incorporated into the classifier. The simple way of translating these rules into specific fe...
The Stock Sonar - Sentiment Analysis of Stocks Based on a Hybrid Approach
The Stock Sonar (TSS) is a stock sentiment analysis application based on a novel hybrid approach. While previous work focused on document level sentiment classification, or extracted only generic sentiment at the phrase level, TSS integrates sentiment dictionaries, phrase-level compositional patterns, and predicate-level semantic events. TSS generates precise in-text sentiment tagging as well a...